
GRU on top of ELMo embedding layer #12

Open
TharinduDR opened this issue Feb 11, 2019 · 3 comments

TharinduDR commented Feb 11, 2019

Hi,
I replaced my embedding layer with the ELMo embedding layer. The code looks like this -

embedding_layer = ElmoEmbeddingLayer()

# Embedded version of the inputs
encoded_left = embedding_layer(left_input)
encoded_right = embedding_layer(right_input)

# Since this is a siamese network, both sides share the same GRU
shared_gru = GRU(n_hidden, name='gru')

left_output = shared_gru(encoded_left)
right_output = shared_gru(encoded_right)

But I am running into an error: Input 0 is incompatible with layer gru: expected ndim=3, found ndim=2. The architecture worked well with the default embedding layer. Any idea what I am doing wrong?


hambro commented Mar 7, 2019

I think it is because ElmoEmbeddingLayer uses the default argument of Elmo (see the call() function), resulting in a fixed mean-pooling of all contextualized word representations with shape [batch_size, 1024].
This is documented in the Output section of the TensorFlow model.

signature='default' means the input is whole sentences, and that the model should perform the tokenization itself. The interesting part is the ['default'] dict lookup, which should be changed to ['elmo'].
This should return a tensor with the shape [batch_size, max_length, 1024].
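To illustrate the difference in plain NumPy (batch_size=2 and max_length=5 are made-up sizes, not from the thread):

```python
import numpy as np

# With signature='default' and the ['default'] key, ELMo returns one
# mean-pooled vector per sentence: ndim=2, shape [batch_size, 1024].
pooled = np.zeros((2, 1024))        # hypothetical batch_size=2
assert pooled.ndim == 2             # GRU rejects this: expected ndim=3

# A recurrent layer like GRU expects one vector per timestep:
# ndim=3, shape [batch_size, timesteps, features].
per_token = np.zeros((2, 5, 1024))  # hypothetical max_length=5
assert per_token.ndim == 3          # this is what GRU can consume
```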
I've also changed compute_output_shape to return (None, None, 1024), since we don't know how long each sequence is.
However, I still get some errors. Here is my updated, still not working, layer:

import tensorflow as tf
import tensorflow_hub as hub
from keras import backend as K
from keras.engine import Layer

class ElmoEmbeddingLayer(Layer):
    def __init__(self, **kwargs):
        self.dimensions = 1024
        self.trainable = True
        super(ElmoEmbeddingLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.elmo = hub.Module('https://tfhub.dev/google/elmo/2', trainable=self.trainable,
                               name="{}_module".format(self.name))

        self.trainable_weights += K.tf.trainable_variables(scope="^{}_module/.*".format(self.name))
        super(ElmoEmbeddingLayer, self).build(input_shape)

    def call(self, x, mask=None):
        result = self.elmo(
            K.squeeze(
                K.cast(x, tf.string), axis=1
            ),
            as_dict=True,
            signature='default',
            )['elmo']
        return result

    def compute_mask(self, inputs, mask=None):
        return K.not_equal(inputs, '--PAD--')

    def compute_output_shape(self, input_shape):
        return (input_shape[0], None, self.dimensions)
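One possible source of the remaining errors (my guess, not confirmed in this thread): compute_mask still runs on the raw string input of shape (batch_size, 1), so the mask it returns no longer matches the per-token output that a downstream GRU would consume. A minimal NumPy sketch of the mismatch (all shapes hypothetical):

```python
import numpy as np

# Hypothetical batch of two whole-sentence inputs, shape (2, 1),
# as they arrive at the layer before K.squeeze.
inputs = np.array([["the cat sat on the mat"], ["dogs bark"]], dtype=object)

# compute_mask compares the raw input against '--PAD--', so the mask
# inherits the input's shape (2, 1) ...
mask = inputs != "--PAD--"
assert mask.shape == (2, 1)

# ... while the layer now outputs (batch_size, max_length, 1024), and a
# recurrent layer expects a per-timestep mask of shape (batch_size, max_length).
max_length = 6  # hypothetical
expected_mask_shape = (2, max_length)
assert mask.shape != expected_mask_shape  # shapes disagree -> mask errors
```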

@jacobzweig (Contributor)

Thanks @hambro – I had sketched out a working prototype a while back – I'll see if I can dig something up and get it up here when I get a bit of free time.


yjiang18 commented Sep 6, 2019

I modified the elmo embedding layer as follows; the output shape is now [batch_size, seq_len, 1024], and you can put an LSTM/GRU on top of it:

class ElmoEmbeddingLayer(Layer):
    def __init__(self, mask, **kwargs):
        self.dimensions = 1024
        self.trainable = True
        self.mask = mask
        super(ElmoEmbeddingLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.elmo = hub.Module('https://tfhub.dev/google/elmo/2', trainable=self.trainable,
                               name="{}_module".format(self.name))
        self.trainable_weights += K.tf.trainable_variables(scope="^{}_module/.*".format(self.name))
        super(ElmoEmbeddingLayer, self).build(input_shape)

    def call(self, inputs, mask=None):
        # inputs.shape = [batch_size, seq_len]
        # This gives a list of seq_len repeated batch_size times:
        # [seq_len, seq_len, ..., seq_len], just like the official example.
        seq_len = [inputs.shape[1]] * inputs.shape[0]
        result = self.elmo(inputs={"tokens": K.cast(inputs, dtype=tf.string),
                                   "sequence_len": seq_len},
                           as_dict=True,
                           signature='tokens',
                           )['elmo']
        return result

    def compute_mask(self, inputs, mask=None):
        if not self.mask:
            return None
        output_mask = K.not_equal(inputs, '--PAD--')
        return output_mask

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[1], self.dimensions)

The only problem is that you have to give the batch size explicitly to the Input layer. If your input layer currently looks like this:

text_input = Input(shape=(sequence_length,), dtype=tf.string)

you have to modify it to:

text_input = Input(batch_shape=(batch_size, sequence_length), dtype=tf.string)

This means the last batch might be smaller than batch_size, which raises an error when the model reaches the end of an epoch.
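One workaround for the short last batch (just a sketch, not from this thread; pad_to_batch_size is a hypothetical helper) is to pad the dataset so its length is a multiple of batch_size, e.g. by repeating the last sample:

```python
def pad_to_batch_size(samples, batch_size):
    """Repeat the final sample until len(samples) is a multiple of
    batch_size, so a fixed batch_shape Input never sees a short batch."""
    remainder = len(samples) % batch_size
    if remainder:
        samples = samples + [samples[-1]] * (batch_size - remainder)
    return samples

texts = ["a b c", "d e f", "g h i", "j k l", "m n o"]
padded = pad_to_batch_size(texts, batch_size=4)
assert len(padded) % 4 == 0  # 5 samples padded up to 8
```

The duplicated samples slightly bias training, so with this trick it is worth shuffling between epochs or masking the duplicates out of the loss.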
